Skip to content

Cluster #17

Merged
merged 14 commits into from
Dec 21, 2018
Merged

Cluster #17

merged 14 commits into from
Dec 21, 2018

Conversation

HendrikSchultheis
Copy link
Collaborator

  • renamed reduce_bed to reduce_sequence (@renewiegandt please check if I forgot to change anything in nextflow)
  • Better error messages + documentation
  • Both scripts (reduce_sequence & cdhit_wrapper) now check if there is a header provided with the data. If this is the case the header is forwarded to the output.
  • reduce_sequence checks whether jellyfish is installed (only on linux)
  • cdhit_wrapper checks whether cd-hit is installed (only on linux)

So far the check whether the required tools are installed only works on linux. For now this should be enough but tell me if I should look into making this platform independent.

@HendrikSchultheis HendrikSchultheis mentioned this pull request Dec 19, 2018
35 tasks
#' @return reduced bed
#' TODO check whether jellyfish is installed
reduce_bed <- function(input, kmer = 10, motif = 10, output = "reduced.bed", threads = NULL, clean = TRUE, minoverlap_kmer = kmer - 1, minoverlap_motif = ceiling(motif / 2), min_seq_length = max(c(motif, kmer)), motif_occurence = 1) {
#' @details If there is a header supplied other then the default data.table nameing scheme ('V1', 'V2', etc.) it will be kept.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: naming scheme

#' @param threads Number of threads. Default = 1. 0 for all cores.
#' @param input Input bed-file. Last column must be sequences.
#' @param kmer Kmer length. Default = 10
#' @param motif Estimated motif length. Default = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter 'motif' does not contain a lot of information about its function.
Maybe it should be named something like 'estimated_motif_len'?

#' @param output Output file
#' @param threads Number of threads. Default = 1. 0 for all cores.
#' @param input Input bed-file. Last column must be sequences.
#' @param kmer Kmer length. Default = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kmer_len instead of kmer?

#' @param clean Delete all temporary files.
#' @param minoverlap_kmer Minimum required overlap between kmer to merge kmer. Used to create reduced sequence ranges. Can not be greater than kmer length. Default = kmer - 1
#' @param minoverlap_kmer Minimum required overlap between kmer. Used to create reduced sequence ranges out of merged kmer. Can not be greater than kmer length . Default = kmer - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

  • length.
  • k-mers

@@ -9,32 +9,41 @@ option_list <- list(
make_option(opt_str = c("-t", "--threads"), default = 1, help = "Number of threads to use. Use 0 for all available cores. Default = %default", metavar = "integer"),
make_option(opt_str = c("-c", "--clean"), default = TRUE, help = "Delete all temporary files. Default = %default", metavar = "logical"),
make_option(opt_str = c("-s", "--min_seq_length"), default = NULL, help = "Remove sequences below this length. Defaults to the maximum value of motif and kmer and can not be lower.", metavar = "integer", type = "integer"),
make_option(opt_str = c("-n", "--minoverlap_kmer"), default = NULL, help = "Minimum required overlap between kmer to merge kmer. Used to create reduced sequence ranges. Can not be greater than kmer length. Default = kmer - 1", metavar = "integer", type = "integer"),
make_option(opt_str = c("-n", "--minoverlap_kmer"), default = NULL, help = "Minimum required overlap between kmer. Used to create reduced sequence ranges out of merged kmer. Can not be greater than kmer length. Default = kmer - 1", metavar = "integer", type = "integer"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: k-mers. 2x

@@ -68,24 +69,33 @@ opt <- parse_args(opt_parser)
#' @param gat_ext Gap extension score. Default = -1 (CD-HIT parameter)
#' @param sort_cluster_by_size Either sort cluster by decreasing length (= 0) or by decreasing size (= 1). Default = 1 (CD-HIT parameter)
#'
#' TODO check whether cdhit is installed
#' @details If there is a header supplied other then the default data.table nameing scheme ('V1', 'V2', etc.) it will be kept and extended.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: naming scheme

pipeline.nf Outdated
@@ -268,7 +268,7 @@ process overlap_with_known_TFBS {

/*
*/
process reduce_bed {
process reduce_sequence {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a small description to the process reduce_sequence and clustering.

Copy link
Collaborator

@renewiegandt renewiegandt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few typos and made a few suggestions for parameter names. Apperently 'kmer' is writen k-mer you should take a look at that.
You could also add a small description for the processes reduce_sequences and clustering.
The changes in the code functionality look good to me.

@HendrikSchultheis HendrikSchultheis self-assigned this Dec 20, 2018
@HendrikSchultheis HendrikSchultheis added the enhancement New feature or request label Dec 20, 2018
@HendrikSchultheis HendrikSchultheis merged commit 935ba3f into dev Dec 21, 2018
Sign in to join this conversation on GitHub.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants